DETAILED ACTION
Response to Amendment 

1. In response to the Office Action mailed June 21, 2004, applicant submitted an
amendment filed on September 23, 2004, in which the applicant amended claims 1, 4,
12 and 19 to overcome the examiner's rejections.

Response to Arguments 

2. Applicant traverses the rejection under 35 U.S.C. 112, first paragraph, that independent
claim 1 is insufficiently supported by the specification to enable a person of ordinary skill
to make and use the invention, by particularly pointing out the amended page 12 and
figure 6 as amended.

Applicant amended claim 4 to overcome the 35 U.S.C. 112, second paragraph, rejection and to
remove the potential contradiction between claims 1 and 4.

Applicant also amended claim 1 to include the limitation of the objected-to claim
6, which would have been allowable if rewritten in independent form including all of the
limitations of the base claim and any intervening claims.

However, applicant's arguments with respect to claims 1, 12 and 19 have been
considered but are moot in view of the new ground(s) of rejection. 

Claim Rejections - 35 USC § 103

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

4. Claims 1-4, 12-14 and 19 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Bordeaux (USPN 5,757,023) in view of Shu (USPN 6,016,470), and
further in view of Karaali et al. (USPN 5,930,754), hereinafter referenced as Karaali.

Regarding claim 1, Bordeaux discloses "retrieving from storage neural network
parameters, weights and dictionaries for the appropriate language" (col. 5, lines 26-30).
Inherently, the neural network must know the phonological/phonetic units associated with
the language and its variations in order to properly identify phonemes and allophones
(col. 5, lines 38-41).

Bordeaux does not disclose "developing a maximal set based on said defined
phonological units, phonetic units, and identified variations in said language, and
reducing said maximal set to a minimal set of phonemes and allophones wherein said
reducing said maximal set further comprises reducing a text-to-speech phonetics set,
which further comprises removing one of said phonological units, phonetic units and
identified variations in said language, thereby providing for a compact model for
acoustically transcribing said language".

Shu teaches a method of building a rejection grammar, which originally starts
with a full set of phonemes of a given language and then gradually shrinks the set until
sufficient accuracy is achieved using the smallest number of phoneme models (fig. 5,
col. 8, lines 6-10).
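
For illustration only, the kind of shrinking process summarized above can be
sketched as a greedy backward elimination. This is a minimal sketch and not
the procedure of the Shu reference itself: the function name
shrink_phoneme_set, the accuracy callback, the min_accuracy threshold and the
toy scorer are all hypothetical names introduced here.

```python
from typing import Callable, FrozenSet, Iterable


def shrink_phoneme_set(
    full_set: Iterable[str],
    accuracy: Callable[[FrozenSet[str]], float],
    min_accuracy: float,
) -> FrozenSet[str]:
    """Greedily drop phoneme models while accuracy stays acceptable."""
    current = frozenset(full_set)
    while len(current) > 1:
        # Try removing each phoneme in turn; keep the best-scoring candidate.
        best = max((current - {p} for p in sorted(current)), key=accuracy)
        if accuracy(best) < min_accuracy:
            break  # every possible removal falls below the threshold
        current = best
    return current


# Toy scorer (assumption): only "a", "t" and "k" contribute to accuracy.
def toy_accuracy(models: FrozenSet[str]) -> float:
    needed = {"a", "t", "k"}
    return len(needed & models) / len(needed)


reduced = shrink_phoneme_set(["a", "t", "k", "th", "dh", "zh"], toy_accuracy, 1.0)
print(sorted(reduced))
```

In this toy run the three models that do not affect the score are dropped and
the loop stops once any further removal would lower accuracy below the
threshold. A real system would score each candidate set against recorded
speech rather than a fixed list.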

At the time of the invention, it would have been obvious to a person of ordinary 
skill in the art to modify the system disclosed by Bordeaux to follow up the process of 
training the neural network with a reduction process taught by Shu. This would 
significantly improve the efficiency of the system because duplicate and unnecessary 
phoneme entries would be removed from the system, thus improving the speed of 
operation and reducing memory requirements for the neural network.

Bordeaux in view of Shu discloses a method for transcribing a language
acoustically based on well-defined basic phonetics, but lacks reducing said
maximal set to a minimal set of phonemes and allophones wherein said reducing said
maximal set further comprises reducing a text-to-speech phonetics set, which further
comprises removing one of said phonological units, phonetic units and identified
variations in said language.

Karaali discloses a method for neural network based orthography-phonetics
transformation wherein the orthography-pronunciation lexicon of a text-to-speech system
is reduced in size (column 21, lines 49-60) in a language like English (column 1, lines
25-40), in order to produce the most accurate phonetic representation.

Therefore, it would have been obvious to one of ordinary skill in the art at the
time the invention was made to modify Bordeaux in view of Shu's method such that it
reduces said maximal set to a minimal set of phonemes and allophones wherein said
reducing said maximal set further comprises reducing a text-to-speech phonetics set,
which further comprises removing one of said phonological units, phonetic units and
identified variations in said language, so that the storage requirements would not
exceed the feasibility of most applications.

Regarding claim 2, Bordeaux discloses "retrieving from storage neural network
parameters, weights and dictionaries for the appropriate language" (col. 5, lines 26-30).

Bordeaux does not disclose "a step of extracting information that further 
comprises: identifying terminological problems associated with said language, 
identifying transcription problems associated with said language, extracting all 
phonological and phonetic units associated with said language, and selecting a 
representative symbol for the transcription alphabet". 

At the time of the invention, it would have been obvious to a person of ordinary 
skill in the art that the "dictionaries for the appropriate language" could contain 
additional terminological and transcription information about the language, as well as full 
phonological/phonetic alphabets for that language. This would allow the system to keep 
all information pertinent to recognition of a specific language in a logically separate data 
unit, such as a dictionary.

Regarding claim 3, Bordeaux does not disclose "maximal set (that) comprises 
any of, or a combination of: phonemes, allophones, rules governing the selection of 
allophones, a set of examples, and transliteration symbols". 

Shu teaches a method of building a rejection grammar, which originally starts
with a maximum (full) set of phonemes of a given language and then gradually shrinks
the set until sufficient accuracy is achieved using the smallest number of phoneme
models (fig. 5, col. 8, lines 6-10).

At the time of the invention, it would have been obvious to a person of ordinary 
skill in the art to modify the system disclosed by Bordeaux to train the neural network to 
initially contain a full set of phoneme models, as taught by Shu. This would allow the 
system to learn all the necessary phonemes for language recognition, even at the cost 
of some unnecessary phoneme duplication. These duplicate phonemes are removed in
the "reduction" step that follows the creation of the maximal set.

Regarding claim 4, Bordeaux discloses the training of the neural network using
examples for each of the desired phones for a specific language (col. 10, lines 20-23).

Bordeaux does not disclose "a said step of reducing said maximal set further 
comprises reducing an automatic speech recognition phonetic set." 

Shu teaches a method of building a rejection grammar, which originally starts
with a full set of phonemes of a given language and then gradually shrinks the set until
sufficient accuracy is achieved using the smallest number of phoneme models (fig. 5,
col. 8, lines 6-10).

At the time of the invention, it would have been obvious to a person of ordinary 
skill in the art to modify the system disclosed by Bordeaux to follow up the process of 
training the neural network with a reduction process taught by Shu. This would 
significantly improve the efficiency of the system because duplicate and unnecessary 
phoneme entries would be removed from the ASR set of the system, thus improving the
speed of operation and reducing memory requirements for the neural network.

Regarding claim 12, Bordeaux discloses a microphone (col. 5, line 66), a computer
system (col. 13, lines 34-36), and a Medium Vision Pro Audio Spectrum card (col. 13,
lines 56-57). The computer system and accompanying software perform speech
analysis and voice-to-text translation (col. 13, lines 57-60).

Bordeaux does not disclose using a compact set of phonetic alphabets for a
voice-to-text system including a reduced text-to-speech phonetics set from which one of a
phonological unit, a phonetic unit and an identified variation in said language has been
removed.

Shu teaches a method of constructing a smaller set of phoneme models (fig. 5,
col. 8, lines 6-10).

At the time of the invention, it would have been obvious to a person of ordinary 
skill in the art to modify the system disclosed by Bordeaux to follow up the process of 
training the neural network with a reduction process taught by Shu. This would 
significantly improve the efficiency of the system because duplicate and unnecessary 
phoneme entries would be removed from the system, thus improving the speed of operation
and reducing memory requirements for the neural network.

Bordeaux in view of Shu discloses a method for transcribing a language
acoustically based on well-defined basic phonetics, but lacks including a reduced
text-to-speech phonetics set from which one of a phonological unit, a phonetic unit and
an identified variation in said language has been removed.

Karaali discloses a method for neural network based orthography-phonetics
transformation wherein the orthography-pronunciation lexicon of a text-to-speech system
is reduced in size (column 21, lines 49-60) in a language like English (column 1, lines
25-40), in order to produce the most accurate phonetic representation.

Therefore, it would have been obvious to one of ordinary skill in the art at the
time the invention was made to modify Bordeaux in view of Shu's method such that it
includes a reduced text-to-speech phonetics set from which one of a phonological unit, a
phonetic unit and an identified variation in said language has been removed, so that the
storage requirements would not exceed the feasibility of most applications.

Regarding claim 13, Bordeaux discloses ASR (voice-to-text translation, fig. 1,
input to element 3, output of element 7).

Regarding claim 14, Bordeaux discloses an ASR system that is
speaker-independent (col. 13, line 63).

Regarding claim 19, Bordeaux discloses a microphone (col. 5, line 66), a
computer system (col. 13, lines 34-36), and a Medium Vision Pro Audio Spectrum card
(col. 13, lines 56-57). The computer system stores language dictionaries (9, fig. 1) and
accompanying software performs speech analysis and voice-to-text translation (col. 13,
lines 57-60).

Bordeaux does not disclose using a compact set of phonetic alphabets for a
voice-to-text system including a reduced text-to-speech phonetics set from which one of a
phonological unit, a phonetic unit and an identified variation in said language has been
removed.

Shu teaches a method of constructing a smaller set of phoneme models (fig. 5,
col. 8, lines 6-10).

At the time of the invention, it would have been obvious to a person of ordinary 
skill in the art to modify the system disclosed by Bordeaux to follow up the process of 
training the neural network with a reduction process taught by Shu. This would 
significantly improve the efficiency of the system because duplicate and unnecessary 
phoneme entries would be removed from the system, thus improving the speed of 
operation and reducing memory requirements for the neural network.

Bordeaux in view of Shu discloses a method for transcribing a language
acoustically based on well-defined basic phonetics, but lacks including a reduced
text-to-speech phonetics set from which one of a phonological unit, a phonetic unit and
an identified variation in said language has been removed.

Karaali discloses a method for neural network based orthography-phonetics
transformation wherein the orthography-pronunciation lexicon of a text-to-speech system
is reduced in size (column 21, lines 49-60) in a language like English (column 1, lines
25-40), in order to produce the most accurate phonetic representation.

Therefore, it would have been obvious to one of ordinary skill in the art at the
time the invention was made to modify Bordeaux in view of Shu's method such that it
includes a reduced text-to-speech phonetics set from which one of a phonological unit, a
phonetic unit and an identified variation in said language has been removed, so that the
storage requirements would not exceed the feasibility of most applications.

5. Claim 5 is rejected under 35 U.S.C. 103(a) as being obvious over Bordeaux in
combination with Shu and Karaali, as applied to claim 4, and further in view of Selounai
("Recognition of Arabic Phonetic Features Using Neural Networks and Knowledge-
Based System: a Comparative Study").

Regarding claim 5, Bordeaux discloses a phone identifier (5, fig. 1) that is trained
to recognize phonemes and all legitimate speech sounds in a language including such
sounds as murmurs, and allophones (col. 8, lines 8-15).

Bordeaux in combination with Shu and Karaali do not disclose that the step of 
reducing an automatic speech recognition set further comprises the use of diacritics, 
graphemes, and allophones. 

Selounai teaches the use of diacritics and graphemes as part of the Arabic 
phonetic alphabet (page 408, right column, lines 50-55). 

At the time of the invention, it would have been obvious to a person of ordinary 
skill in the art to modify the reduction of language set disclosed by Bordeaux in 
combination with Shu and Karaali to include diacritics and graphemes because this 
would allow Bordeaux's system to handle standard Arabic utterances for speech-to-text
translation, in order to improve linguistic capabilities of the system. 

6. Claim 15 is rejected under 35 U.S.C. 103(a) as being obvious over Bordeaux, as
applied to claim 14, in combination with Shu and Karaali, and further in view of Neti et
al. (USPN 5,953,701), hereinafter referenced as Neti.

Regarding claim 15, Bordeaux in combination with Shu and Karaali do not
disclose a system that is speaker dependent on gender or age.

However, Neti et al. teaches gender-dependent speech recognition (Abstract).

Therefore, it would have been obvious to a person of ordinary skill in the art to
modify the system disclosed by Bordeaux in combination with Shu and Karaali to use
gender-dependent speech recognition, as taught by Neti, because it would enable it to
perform better in contexts where speech recognition of a specific gender was desirable.

7. Claim 8 is rejected under 35 U.S.C. 103(a) as being obvious over Bordeaux in
combination with Shu and Karaali, as applied to claim 1, and further in view of Buth et
al. (USPN 6,546,369), hereinafter referenced as Buth.

Regarding claim 8, Bordeaux in combination with Shu and Karaali do not
disclose the use of the International Phonetics Alphabet (IPA) for transcribing the language.
Buth et al. teaches the use of IPA to transcribe the language (col. 1, lines 63-65).

At the time of the invention, it would have been obvious to a person of ordinary 
skill in the art to modify the system disclosed by Bordeaux in combination with Shu and 
Karaali to use IPA for transcribing. This would enable the system to transcribe using
an internationally accepted standard, hence making the system portable to a multitude of
languages.

8. Claim 9 is rejected under 35 U.S.C. 103(a) as being obvious over Bordeaux in
combination with Shu and Karaali, as applied to claim 1, and further in view of Selounai.

Regarding claim 9, Bordeaux discloses that his system supports a multitude of 
pre-determined languages using a neural network (col. 5, lines 20-27). 

Bordeaux in combination with Shu and Karaali do not disclose that one of these 
languages is modern Arabic, classical Arabic or colloquial Arabic.

Selounai teaches the use of neural networks for automatic recognition of the Arabic
language (Abstract).

At the time of the invention, it would have been obvious to a person of ordinary 
skill in the art that the neural networks in the system disclosed by Bordeaux in 
combination with Shu and Karaali could use an approach taught by Selounai. This 
modification of the system would allow it to support Arabic among other pre-determined 
languages in order to increase linguistic capability of the system. 

9. Claims 10-11 are rejected under 35 U.S.C. 103(a) as being obvious over
Bordeaux in combination with Shu and Karaali, as applied to claim 1, and further in view
of Jeppesen (USPN 6,490,557).

Regarding claim 10, Bordeaux discloses the use of a computer for the ASR system
(col. 13, lines 30-60).

Bordeaux in combination with Shu and Karaali do not disclose downloading 
phonetic information over a network. 

Jeppesen teaches the use of the Internet with a central ASR system (23, fig. 3).

At the time of the invention, it would have been obvious to a person of ordinary
skill in the art to centralize the ASR system to use the Internet as taught by Jeppesen,
because it would allow moving the storage of phonetic information to a server and
hence reduce the amount of information stored on the client computer. In addition, it
would simplify the update of phoneme information to the ASR computers in the
networked environment.

Regarding claim 11, Bordeaux in combination with Shu and Karaali do not teach 
downloading phonetic information over WAN, LAN, Internet and HTTP-based networks. 

However, it would have been obvious to a person of ordinary skill in the art that
use of the Internet inherently embodies the use of WAN, LAN, wireless and other types of
networks.

10. Claims 16 and 18 are rejected under 35 U.S.C. 103(a) as being obvious over
Bordeaux in combination with Shu and Karaali, as applied to claim 12, and further in 
view of Selounai. 

Regarding claim 16, Bordeaux discloses a phone identifier (5, fig. 1) that is 
trained to recognize phonemes and all legitimate speech sounds in a language 
including such sounds as murmurs, and allophones (col. 8, lines 8-15). 

Bordeaux in combination with Shu and Karaali do not disclose the use of 
diacritics and graphemes in the phonetic alphabet. 

Selounai teaches the use of diacritics and graphemes as part of the Arabic
phonetic alphabet (page 408, right col., lines 50-55).

At the time of the invention, it would have been obvious to a person of ordinary 
skill in the art to modify the language set disclosed by Bordeaux in combination with 
Shu and Karaali to include diacritics and graphemes because this would allow the 
system to handle standard Arabic utterances for speech-to-text translation, in order to 
improve linguistic capabilities of the system. 

Regarding claim 18, Bordeaux discloses that his system supports a multitude of 
pre-determined languages using a neural network (col. 5, lines 20-27). 

Bordeaux in combination with Shu and Karaali do not disclose that one of these 
languages is modern Arabic, classical Arabic or colloquial Arabic. 

Selounai teaches the use of neural networks for automatic recognition of Arabic 
language (abstract). 

At the time of the invention, it would have been obvious to a person of ordinary
skill in the art that the neural networks in the system disclosed by Bordeaux in 
combination with Shu and Karaali could use an approach taught by Selounai. This 
modification of the system would allow it to support Arabic among other pre-determined 
languages. 

11. Claim 17 is rejected under 35 U.S.C. 103(a) as being obvious over Bordeaux in
combination with Shu and Karaali, as applied to claim 12, and further in view of Buth.

Regarding claim 17, Bordeaux in combination with Shu and Karaali do not
disclose the use of the International Phonetics Alphabet (IPA) for transcribing the language.
Buth et al. teaches the use of IPA to transcribe the language (col. 1, lines 63-65).

At the time of the invention, it would have been obvious to a person of ordinary
skill in the art to modify the ASR system disclosed by Bordeaux in combination with Shu
and Karaali to use IPA for transcribing. This would enable the system to transcribe using
an internationally accepted standard, hence making the system portable to a multitude of
languages.

12. Claim 20 is rejected under 35 U.S.C. 103(a) as being obvious over Bordeaux in
combination with Shu and Karaali, as applied to claim 19, and further in view of 
Selounai. 

Regarding claim 20, Bordeaux discloses a phone identifier (5, fig. 1) that is
trained to recognize phonemes and all legitimate speech sounds in a language
including such sounds as murmurs, and allophones (col. 8, lines 8-15).

Bordeaux in combination with Shu and Karaali do not disclose the use of 
diacritics and graphemes in the phonetic alphabet. 

Selounai teaches the use of diacritics and graphemes as part of the Arabic
phonetic alphabet (page 408, right col., lines 50-55).

At the time of the invention, it would have been obvious to a person of ordinary 
skill in the art to modify the language set disclosed by Bordeaux in combination with 
Shu and Karaali to include diacritics and graphemes because this would allow the 
system to handle standard Arabic utterances for speech-to-text translation in order to 
improve linguistic capabilities of the system. 

Allowable Subject Matter

13. Claim 7 is objected to as being dependent upon a rejected base claim 1, but
would be allowable if rewritten in independent form including all of the limitations of the 
base claim and any intervening claims. 

The following is a statement of reasons for the indication of allowable subject 
matter: the prior art of record does not teach nor fairly suggest the combination of 
elements including reduction of text-to-speech phonetics set using allophones and 
adding symbols representing the phoneme to be geminated. 



Conclusion 

14. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 

• Sirivara (US Patent Publication No. 2002/0143543) discloses compressing & 
using a concatenate speech database in text-to-speech systems. 

15. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Jakieda R Jackson whose telephone number is
703.305.5593. The examiner can normally be reached on Monday through Friday from
7:30 a.m. to 5:00 p.m.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Doris To, can be reached on 703.305.4827. The fax phone number for the
organization where this application or proceeding is assigned is 703-872-9306.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 

JRJ 

February 15, 2005 